Textual Paraphrase Dataset for Deep Language Modelling

Abstract

The Turku Paraphrase Corpus is a dataset of over 100,000 Finnish paraphrase pairs. During the corpus creation, we strove to gather challenging paraphrase pairs, more suitable to test the capabilities of natural language understanding models. The paraphrases are both selected and classified manually, so as to minimise lexical overlap and provide examples that are structurally and lexically different to the maximum extent. An important distinguishing feature of the corpus is that most of the paraphrase pairs are extracted and distributed in their native document context, rather than in isolation. The primary application for the dataset is the development and evaluation of deep language models, and representation learning in general.

Similar resources

Paraphrase and Textual Entailment Generation

One particular information can be conveyed by many different sentences. This variety concerns the choice of vocabulary and style as well as the level of detail (from laconism or succinctness to total verbosity). Although verbosity in written texts is considered bad style, generated verbosity can help natural language processing (NLP) systems to fill in the implicit knowledge. The paper presents...

Paraphrase Substitution for Recognizing Textual Entailment

We describe a method for recognizing textual entailment that uses the length of the longest common subsequence (LCS) between two texts as its decision criterion. Rather than requiring strict word matching in the common subsequences, we perform a flexible match using automatically generated paraphrases. We find that the use of paraphrases over strict word matches represents an average F-measure ...

Paraphrase and Textual Entailment Generation in Czech

Paraphrase and textual entailment generation can support natural language processing (NLP) tasks that simulate text understanding, e.g., text summarization, plagiarism detection, or question answering. A paraphrase, i.e., a sentence with the same meaning, conveys a certain piece of information with new words and new syntactic structures. Textual entailment, i.e., an inference that humans will j...

Paraphrase and Textual Entailment Recognition and Generation

Paraphrasing methods recognize, generate, or extract phrases, sentences, or longer natural language expressions that convey almost the same information. Textual entailment methods, on the other hand, recognize, generate, or extract pairs of natural language expressions, such that a human who reads (and trusts) the first element of a pair would most likely infer that the other element is also tr...

Journal

Journal title: Cognitive technologies

Year: 2022

ISSN: 2197-6635, 1611-2482

DOI: https://doi.org/10.1007/978-3-031-17258-8_27